# GRPO Optimization

INTELLECT-2 is a large language model with 32 billion parameters released by Prime Intellect. It is built on the Qwen2 architecture and focuses on mathematical, coding, and logical reasoning tasks.

Large Language Model

Qwen3 8B Grpo Medmcqa

A fine-tuned version of Qwen/Qwen3-8B trained on the medmcqa-grpo dataset, specialized in medical multiple-choice question answering.

Large Language Model

E1-Math-1.5B is a language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It supports elastic reasoning, is trained with the GRPO method, and is suited to reasoning under a constrained budget.

Large Language Model

Xiyansql QwenCoder 32B 2504

XiYanSQL-QwenCoder-2504 is the latest XiYanSQL model for SQL generation. It combines fine-tuning with GRPO training and supports multiple SQL dialects, aiming for efficient and accurate SQL generation.

Large Language Model

Safetensors Supports Multiple Languages

TBAC VLR1 3B Preview

A multimodal language model from the Tencent PCG Basic Algorithm Center, fine-tuned from Qwen2.5-VL-3B-Instruct. It achieves state-of-the-art performance on multiple multimodal reasoning benchmarks among models of the same scale.

Safetensors English

Yehia 7B Preview

Yehia is an empathetic Arabic language model built on ALLaM-7B-Instruct-preview. It offers considerate, friendly, and practical conversations in Arabic and English.

Large Language Model

Transformers Supports Multiple Languages

Qwen GLOCON Reasoning

A model based on Qwen2.5-3B-Instruct and trained with reinforcement learning for conflict event classification. It is optimized with the GRPO method using multiple reward signals and a structured reasoning format.

Large Language Model English


© 2025 AIbase